question: Why is the availability of real-world datasets essential for advancing machine learning research and development?
option 1: Real-world datasets are easier to model and capture imperfections
option 2: Synthetic datasets cannot provide initial insights
option 3: Real-world imperfections are hard to model and capture in synthetic datasets
option 4: Real-world datasets have smaller data distribution shifts